Compressed basis GMRES on high-performance graphics processing units
Abstract
Krylov methods provide a fast and highly parallel numerical tool for the iterative solution of many large-scale sparse linear systems. To a large extent, the performance of practical realizations of these methods is constrained by the communication bandwidth in current computer architectures, motivating the investigation of sophisticated techniques to avoid, reduce, and/or hide the message-passing costs (in distributed platforms) and the memory accesses (in all architectures). This article leverages Ginkgo’s accessor in order to integrate a communication-reduction strategy into the (Krylov) GMRES solver that decouples the storage format (i.e., the data representation in memory) of the orthogonal basis from the arithmetic precision employed during the operations with that basis. Given that the execution time of the GMRES solver is largely determined by the memory accesses, the cost of the datatype transforms can be mostly hidden, resulting in an acceleration of the iterative step via a decrease in the volume of bits being retrieved from memory. Together with the special properties of the orthonormal basis (whose elements are all bounded by 1), this paves the road toward an aggressive customization of the storage format, which includes some floating-point as well as fixed-point formats with mild impact on the convergence of the iterative process. We develop a high-performance implementation of the “compressed basis GMRES” solver in the Ginkgo linear algebra library using a set of test problems from the SuiteSparse Matrix Collection. We demonstrate robustness and performance advantages on a modern NVIDIA V100 graphics processing unit (GPU) of up to 50% over the standard GMRES solver that stores all data in IEEE double-precision.
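The decoupling of storage format from arithmetic precision described in the abstract can be illustrated with a small sketch. This is not Ginkgo's accessor API and not the authors' implementation: the class `CompressedBasis`, the format names `float32` and `fixed16`, and the helper functions are hypothetical names chosen for illustration, and the orthogonalization uses a plain modified Gram-Schmidt step for brevity. Basis vectors are compressed when written and expanded to double precision when read, so every arithmetic operation runs in double precision while fewer bytes are kept per entry.

```python
from array import array
import math

# Hypothetical format names -> typecodes of Python's array module.
TYPECODES = {"float64": "d", "float32": "f", "fixed16": "h"}
FIXED_SCALE = 2 ** 15 - 1  # unit-norm vectors have entries in [-1, 1]


class CompressedBasis:
    """Keeps basis vectors in a compact format; callers only see doubles.

    Illustration only: this is not Ginkgo's accessor interface.
    """

    def __init__(self, storage="float32"):
        if storage not in TYPECODES:
            raise ValueError("unknown storage format: " + storage)
        self.storage = storage
        self.vectors = []

    def append(self, values):
        """Compress a double-precision vector on write."""
        if self.storage == "fixed16":
            data = array("h", (round(v * FIXED_SCALE) for v in values))
        else:
            data = array(TYPECODES[self.storage], values)
        self.vectors.append(data)

    def read(self, j):
        """Expand stored vector j back to double precision on read."""
        if self.storage == "fixed16":
            return [x / FIXED_SCALE for x in self.vectors[j]]
        return [float(x) for x in self.vectors[j]]

    def bytes_per_entry(self):
        return array(TYPECODES[self.storage]).itemsize


def orthogonalize(basis, w):
    """Modified Gram-Schmidt step with all arithmetic in double precision."""
    w = list(w)
    coeffs = []
    for j in range(len(basis.vectors)):
        v = basis.read(j)
        h = math.fsum(vi * wi for vi, wi in zip(v, w))
        coeffs.append(h)
        w = [wi - h * vi for vi, wi in zip(v, w)]
    norm = math.sqrt(math.fsum(wi * wi for wi in w))
    return coeffs, norm, [wi / norm for wi in w]


def demo():
    """Returns {format: (bytes per entry, |v0 . v1| of the stored vectors)}."""
    results = {}
    for fmt in TYPECODES:
        basis = CompressedBasis(fmt)
        basis.append([0.6, 0.8, 0.0])  # unit-norm starting vector
        _, _, v1 = orthogonalize(basis, [1.0, 1.0, 1.0])
        basis.append(v1)
        v0, v1 = basis.read(0), basis.read(1)
        loss = abs(math.fsum(a * b for a, b in zip(v0, v1)))
        results[fmt] = (basis.bytes_per_entry(), loss)
    return results


if __name__ == "__main__":
    for fmt, (nbytes, loss) in demo().items():
        print(fmt, nbytes, loss)
```

In this toy setting the compact formats lose some orthogonality between the stored vectors, which is the kind of effect the abstract describes as having a mild impact on convergence. The fixed-point variant relies on the bound on the basis entries, so a single fixed scale factor suffices.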
Similar resources
High-Performance Pseudo-Random Number Generation on Graphics Processing Units
This work considers the deployment of pseudo-random number generators (PRNGs) on graphics processing units (GPUs), developing an approach based on the xorgens generator to rapidly produce pseudo-random numbers of high statistical quality. The chosen algorithm has configurable state size and period, making it ideal for tuning to the GPU architecture. We present a comparison of both speed and sta...
Algorithmic performance studies on graphics processing units
We report on our experience with integrating and using graphics processing units (GPUs) as fast parallel floating-point co-processors to accelerate two fundamental computational scientific kernels on the GPU: sparse direct factorization and nonlinear interior-point optimization. Since a full re-implementation of these complex kernels is typically not feasible, we identify the matrix-matrix multi...
Cofactorization on Graphics Processing Units
We show how the cofactorization step, a compute-intensive part of the relation collection phase of the number field sieve (NFS), can be farmed out to a graphics processing unit. Our implementation on a GTX 580 GPU, which is integrated with a state-of-the-art NFS implementation, can serve as a cryptanalytic co-processor for several Intel i7-3770K quad-core CPUs simultaneously. This allows those ...
Graphics Processing Units and High-Dimensional Optimization.
This paper discusses the potential of graphics processing units (GPUs) in high-dimensional optimization problems. A single GPU card with hundreds of arithmetic cores can be inserted in a personal computer and can dramatically accelerate many statistical algorithms. To exploit these devices fully, optimization algorithms should reduce to multiple parallel tasks, each accessing a limited amount of d...
High Performance Direct Gravitational N-body Simulations on Graphics Processing Units: An implementation in CUDA
At the end of 2006 NVIDIA introduced a new generation of graphical processing units (GPUs) (the so-called G80 architecture). These GPUs are more powerful than any of the GPUs released before; they offer up to 350 billion floating-point operations per second (GFLOP/s) in certain situations. With the introduction of this hardware NVIDIA released a new programming environment that makes it easier ...
Journal
Journal title: International Journal of High Performance Computing Applications
Year: 2022
ISSN: 1741-2846, 1094-3420
DOI: https://doi.org/10.1177/10943420221115140